CGI Web Applications with Python, Part One

- - - - - - - - - - - -

By Michael Foord  | 2004-03-15

One of today's hottest topics is "web applications." Unlike traditional "shrinkwrapped" "executable" software that runs locally on a desktop machine, a web application runs on a centralized server and delivers its features via the Internet, usually via HTTP and a common web browser. Web applications are increasingly popular, because they can be accessed readily -- just point the browser to a URL -- and can be accessed simultaneously by a number of users. Some web applications provide e-commerce (think eBay), some provide entertainment (such as Yahoo! Games), and others, such as Salesforce.com, manage enterprise information.

While Java, Perl, and PHP are often lauded as ideal programming languages for web application development, Python is just as capable. Indeed, Python is perfectly suited to delivering dynamic content across the Internet.

The simplest way to create web applications with Python is to use the Common Gateway Interface (CGI) [1]. CGI is just another protocol: it describes how to connect clients to web applications.

Normally, when you fetch static content from a web server, the server finds the file [2] that you're requesting and sends it back in a response. For example, a request for http://www.example.com/contact.html returns the HTML page contact.html. However, if the request refers to a CGI script, then instead of returning the script (as content), the script runs and the output of the script is sent in the response. CGI scripts generate content dynamically in response to a request (and its parameters, as you'll see shortly).

Once you understand how CGI works, producing dynamic content is as simple as using the print statement. And contrary to its reputation, CGI is not necessarily slow. Even though the Python interpreter launches for each and every script invocation, these days you should try CGI before choosing a more complex web application framework [3].

Let's dive into CGI programming with Python. This first of two parts explains the basics of CGI, describes how HTML forms are sent, and explains how to process form input. The next article provides an example application and covers more advanced CGI topics, such as CGI environment variables, HTML templating, and Unicode.

All code in this article is intended to work with Python 2.2 and beyond.

Headers and Line Endings

Half the battle of writing a web application is returning the right headers in response to a request. Sending valid headers isn't just important for the receiving client -- if your program doesn't emit valid headers, the web server assumes that your script has failed and displays the dreaded Error 500... Internal Server Error.

There are lots of different headers you can send [4]. But at a minimum, you must send a Content-Type header (in fact, in many situations this may be the only header you need to send) and you must end your list of headers with a blank line.

All headers are of the form header-type: header-value\r\n. The line ending \r\n is required to comply with the relevant RFC [5]. However, most clients and servers allow just \n, which is what you'll get as a normal line ending on UNIX type systems [6].
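As a rough illustration of that format, the sketch below assembles a header block as one string. The helper name build_headers is hypothetical (it isn't part of this article's scripts or of any library), and the code is written to run on a current Python interpreter.

```python
def build_headers(headers):
    # Each header line has the form "name: value" followed by \r\n.
    lines = ['%s: %s\r\n' % (name, value) for name, value in headers]
    # One extra \r\n (a blank line) marks the end of the headers.
    return ''.join(lines) + '\r\n'


block = build_headers([('Content-Type', 'text/html')])
```

Everything the script writes after this block is treated as the body of the response.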

Hello World

Let's do the obligatory "Hello, World" program as a CGI:

#!/usr/bin/python
import sys
try: 
    import cgitb
    cgitb.enable()
except ImportError:
    sys.stderr = sys.stdout

def cgiprint(inline=''):
    sys.stdout.write(inline)
    sys.stdout.write('\r\n')
    sys.stdout.flush()           

contentheader = 'Content-Type: text/html'

thepage = '''<html><head>
<title>%s</title>
</head><body>
%s
</body></html>
'''
h1 = '<h1>%s</h1>'

if __name__ == '__main__':
    cgiprint(contentheader)   # content header
    cgiprint()                       # finish headers with blank line

    title = 'Hello World'
    headline = h1 % 'Hello World'

    print thepage % (title, headline)

Let's walk through the code.

If you're running the CGI script on a Linux or Unix system, you must include the obligatory "shebang" line (#!/usr/bin/python) at line 1 to tell the system where to find Python [7].

The next part of the script is a try/except block that attempts to import the cgitb module. Normally, errors in a Python program are sent to sys.stderr. However, when running CGIs, sys.stderr translates to the server error log. But constantly digging out errors from the error log is a nuisance when debugging. Instead, cgitb pretty-prints tracebacks, including useful information like variable values, to the browser. (This module was only introduced in Python 2.2.) If the import fails, stderr redirects to stdout, which does a similar, but not so effective job. (Do not use the cgitb module in production applications. The information it displays includes details about your system that may be useful to a would-be attacker.)

Next, cgiprint() writes a single line to the output and terminates it with the correct line ending. (cgiprint() need only be used for the header lines.) The script first uses it to send a Content-Type header. Because the script is sending a web page (which is a form of text) the type/subtype is text/html. Only one header is sent, then a second call to cgiprint(), with no argument, terminates the headers with a blank line.

cgiprint() also flushes the output buffer using sys.stdout.flush(). Most servers buffer the output of scripts until it's completed. For long-running scripts [8], buffering output may frustrate your user, who'll wonder what's happening. You can either regularly flush your buffer, or run Python in unbuffered mode. The command-line option to do this is -u, which you can specify as #!/usr/bin/python -u in your shebang line.

Finally, the script sends a small HTML page, which should look very familiar to you, if you've used HTML before.

User Interface and HTML FORMs

When writing CGIs, your user interface is the web browser. Combining JavaScript, Dynamic HTML (DHTML), and HTML forms, you can create rich web applications.

The basic HTML elements used to communicate with CGIs are forms and form input components, including text boxes, radio buttons, check boxes, pulldown menus, and the like [9].

Example Form

A typical, simple HTML form might be coded like this:

<form action="/cgi-bin/formprocessor.py" method="get">
  What is Your Name : <input name="param1" 
    type="text" value="Joe Bloggs" /><br />
  <input name="param2" type="radio" value="this"  
    checked="checked" /> Select This<br />
  <input name="param2" type="radio"  value="that" />or That<br />
  <input name="param3" type="checkbox" checked="checked" /> 
    Check This<br />
  <input name="param4" type="checkbox"  checked="checked" /> 
    and This Too ?<br />
  <input name="hidden_param" type="hidden" value="some_value" />
  
  <input type="reset"  />
  <input type="submit" />
</form>

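This translates into something like this (border added for effect):

[Rendered form: a text box labeled "What is Your Name", radio buttons labeled "Select This" and "or That", check boxes labeled "Check This" and "and This Too ?", and Reset and Submit buttons.]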

When the user hits the Submit button, his (or her) form settings are encapsulated into an HTTP request. Inside the form tag are two parameters that determine how that encapsulation occurs. The action parameter is the URI of your CGI script. This is where the request is sent to. The method parameter specifies how the values are encoded into the request. The two possible methods are GET and POST.

The simpler of the two encoding choices is GET. With GET, the form's values are encoded to be "URL safe" [10] and are then added onto the end of the URL as a list of parameters. With POST, the encoded values are sent as the body of the request, after the headers are sent.

While GET is simpler, the length of URLs is limited. Hence, using GET imposes a maximum limit on the form entry that can be sent. (About 1,000 characters is the limit for many servers.) If you're using a form to get a long text entry from your user, use POST. POST is more suitable for requests where more data is being sent [11].
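To make the difference concrete, here is a minimal sketch of where the encoded values end up for each method. The function encode_submission is a hypothetical helper invented for illustration; it only builds strings and doesn't send anything. The import is written so the sketch runs on both Python 3 and Python 2.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2


def encode_submission(action, method, fields):
    """Return (url, body) showing where the encoded values travel."""
    encoded = urlencode(fields)
    if method.lower() == 'get':
        # GET: the values are appended to the URL after a '?'
        return action + '?' + encoded, ''
    # POST: the URL is untouched and the values form the request body
    return action, encoded


fields = [('param1', 'Joe Bloggs'), ('param2', 'this')]
get_url, get_body = encode_submission('/cgi-bin/formprocessor.py', 'get', fields)
post_url, post_body = encode_submission('/cgi-bin/formprocessor.py', 'post', fields)
```

The encoded string is identical in both cases; only its position in the request changes.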

One advantage of GET, though, is that you can encode values yourself into a normal HTML link. This means parameters can be sent to your program without the user having to hit a submit button. An encoded set of values looks like:

param1=value1&param2=value+2&param3=value%263 

(An HTTP GET request has this string added to the URL.) So, the whole URL might become something like http://www.someserver.com/cgi-bin/test.py?param1=value1&param2=value+2&param3=value%263.

The ? separates the URI of your script from the encoded parameters. The & characters separate the parameters from each other. The + represents a space (which shouldn't be sent as part of a URI, of course), and the %26 is the encoded value that represents an &. & shouldn't be sent as part of a value or the CGI would think that a new parameter was being sent.
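You can check this decoding yourself with the standard library. The sketch below takes the example URL from above apart; it uses urlsplit and parse_qs directly rather than the cgi module, so it runs without a web server. The import fallback is there only so it works on both Python 3 and Python 2.6+.

```python
try:
    from urllib.parse import parse_qs, urlsplit   # Python 3
except ImportError:
    from urlparse import parse_qs, urlsplit       # Python 2.6+

url = ('http://www.someserver.com/cgi-bin/test.py'
       '?param1=value1&param2=value+2&param3=value%263')

parts = urlsplit(url)
# Everything after the '?' is the encoded parameter string
params = parse_qs(parts.query)
# params maps each name to a list of decoded values:
# '+' has become a space and '%26' has become '&'
```

Note that parse_qs always returns lists, because a name may appear more than once in the query string.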

If you encode your own values into a URL, use the function urllib.urlencode() from the urllib module like this:

import urllib

script_url = 'http://www.someserver.com/cgi-bin/test.py'
value_dict = { 'param_1' : 'value1', 'param_2' : 'value2' }
encoded_params = urllib.urlencode(value_dict)
full_link = script_url + '?' + encoded_params

Receiving FORM Submissions

HTML forms are encapsulated into requests in a way that equates well to Python's dictionary data type. Each form input element has a name and a corresponding value.

For instance, if the item is a radio button, the value sent is the value of the selected button. For example, in the form above, the radio button has the name param2 and its value is either this or that. For a checkbox with no value attribute, say param3 or param4 above, the browser sends the value on if the box is checked and leaves the parameter out of the request altogether if it is not.

Now that you know the basics of how forms are encoded and sent to CGI, it's time to introduce Python's cgi module. The cgi module is your interface to receiving form submissions. It makes things very easy.

Reading form data is slightly complicated by two facts. First, form input element names can be repeated, so values can be lists. (Think of a form that allows you to check all of the answers that apply.) Second, by default, an input element that has no value -- such as a text box that hasn't been filled in -- will be missing rather than just empty.
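Both complications are easy to see with an encoded string. The sketch below uses parse_qs from the standard library as a stand-in for a real form submission (an assumption made so the example runs without a server); the query string itself is made up for illustration. parse_qs takes a keep_blank_values flag, and FieldStorage accepts a flag of the same name.

```python
try:
    from urllib.parse import parse_qs   # Python 3
except ImportError:
    from urlparse import parse_qs       # Python 2.6+

# An empty text box plus two checked boxes sharing one name
query = 'name=&interests=sewing&interests=scuba'

# By default the empty 'name' field is dropped entirely
default_view = parse_qs(query)

# With keep_blank_values the empty field is kept as ''
with_blanks = parse_qs(query, keep_blank_values=True)
```

The repeated interests name comes back as a two-item list in both cases.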

The FieldStorage class of the cgi module, when called as FieldStorage(), returns an object that represents the form data. It's almost a dictionary. Rather than repeat the page of the manual on using the cgi module, let's look at a couple of general purpose functions that, given an object created by FieldStorage(), do return dictionaries.

Functions

def getform(theform, valuelist, notpresent='', nolist=False):
    """
    This function, given a CGI form as a
    FieldStorage instance, extracts the
    data from it, based on valuelist
    passed in. Any non-present values are
    set to '' - although this can be
    changed. (e.g. to return None so you
    can test for missing keywords - where
    '' is a valid answer but to have the
    field missing isn't.) It also takes a
    keyword argument 'nolist'. If this is
    True list values only return their
    first value.
    """
    data = {}
    for field in valuelist:
        if not theform.has_key(field):
            # the field is not present (or was empty)
            data[field] = notpresent
        else:
            # the field is present
            if type(theform[field]) != type([]):
                # a single item rather than a list
                data[field] = theform[field].value
            else:
                if not nolist:
                    # we want the whole list
                    data[field] = theform.getlist(field)
                else:
                    # just fetch the first item
                    data[field] = theform.getfirst(field)
    # return only after every expected field has been handled
    return data

def getall(theform, nolist=False):
    """
    Passed a form (cgi.FieldStorage
    instance) return *all* the values.
    This doesn't take into account
    multipart form data (file uploads).
    It also takes a keyword argument
    'nolist'. If this is True list values
    only return their first value.
    """
    data = {}
    # we can't just iterate over the form, but must use the keys() method
    for field in theform.keys():
        if type(theform[field]) ==  type([]):
            if not nolist:
                data[field] = theform.getlist(field)
            else:
                data[field] = theform.getfirst(field)
        else:
            data[field] = theform[field].value
    return data

def isblank(indict):
    """
    Passed a dictionary of values, it
    checks if any of the values are set.
    Returns True if every value is empty,
    else returns False. I use it on a
    form processed with getform to tell if
    my CGI has been activated without any
    form values.
    """
    for key in indict.keys():
        if indict[key]:
            return False
    return True 

For almost all CGIs that receive input from a form, you'll know what parameters to expect. (After all, you probably wrote the form.) If you pass the getform() function your FieldStorage() instance and a list of all parameters you expect to receive, it returns a dictionary of values. Any missing parameters have the default value '', unless you modify the notpresent keyword. If you want to make sure that you don't receive any list values, set the nolist keyword. If a form variable was a list, nolist returns only the first value in the list.

Or, if you want to retrieve all of the values sent by the form, use the getall() function above. It also accepts the optional nolist keyword argument.

isblank() is a special function: it performs a quick test to determine if all the values in the dictionary returned by getall() or getform() are empty. If they are, the CGI was called without parameters. In that case, it's typical to generate a welcome page and a form. If the dictionary isn't blank (isblank() returns False), there's a form to process.
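Here is a small sketch of that decision, using plain dictionaries in place of the output of getform(). isblank() is repeated from above so the example is self-contained; choose_page() and its return values are hypothetical names used only for illustration.

```python
def isblank(indict):
    """Return True if no value in the dictionary is set."""
    for key in indict.keys():
        if indict[key]:
            return False
    return True


def choose_page(formdict):
    # Hypothetical dispatcher: no values means no form was submitted
    if isblank(formdict):
        return 'welcome'
    return 'process'


first_visit = {'param1': '', 'param2': ''}
submission = {'param1': 'Joe Bloggs', 'param2': ''}
```

A single non-empty value is enough for the dictionary to count as a submission.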

Using getform()

In the next article, all of these functions will be put to proper use to build a basic application. But to illustrate their use here, let's process a submission from the Example Form. This program snippet needs the functions above and the first few lines from Hello World.

import cgi

mainpage = '''<html><head><title>Receiving a \
  Form</title></head><body>%s</body></html>'''
error = '''<h1>Error</h1><h2>No Form Submission Was Received</h2>'''
result = '''<h1>Receiving a Form Submission</h1>
<p>We received the following parameters from the form :</p>
<ul>
    <li>Your name is "%s".</li>
    <li>You selected "%s".</li>
    <li>"this" is "%s". </li>
    <li>"this too" is "%s". </li>
    <li> A hidden parameter was sent "%s".</li>
</ul>
'''
possible_parameters = ['param1', 'param2', 'param3', 'param4', \
  'hidden_param']

if __name__ == '__main__':
    cgiprint(contentheader)   # content header
    cgiprint()                # finish headers with blank line

    theform = cgi.FieldStorage()
    
    formdict = getform(theform, possible_parameters)
    if isblank(formdict):
        body = error
    else:
        name = formdict['param1']
        radio = formdict['param2']  # should be 'this' or 'that'
        check1 = formdict['param3'] # 'on' if checked, '' if not
        check2 = formdict['param4']
        hidden = formdict['hidden_param']
        
        body = result % (name, radio, check1, check2, hidden)

    print mainpage % body

Let's walk through this code. There are three main chunks of HTML: mainpage is the frame of the page, which just needs the body to be inserted into it. Then error displays if the script is called without parameters. However, if the script is called from a form submission, then the parameters are extracted and put into result.

The script prints the obligatory headers and then creates the FieldStorage instance to represent the form submission. theform is then passed to the function getform(), along with the list of expected parameters.

If no form submission was made, then all the values in the dictionary returned by getform() are blank ('' in fact). In this case isblank() returns True and body is set to be the error message.

If a form was submitted, then isblank() returns False and the values from the dictionary are extracted and inserted into result. The name variable contains the name entered into the text box. The value from the radio button (in radio) is either this or that, depending on which one was selected. check1 and check2 are each either on, if the checkbox was checked, or '' (the notpresent default), if it was not, because an unchecked box is not sent at all. The hidden parameter is always returned.

Finally, the page is printed, displaying either the error or the results. Easy, no? Using hidden values opens up the possibility of generating unique values and encoding them into the form. These could link requests together, so you can dynamically tailor the content for each user as they navigate through your application (but that's another story).
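As a taste of that idea, the sketch below generates a unique value and wraps it in a hidden input. Everything here is an assumption made for illustration: the field name session_token, the helper names, and the choice of the uuid module as the source of unique values. It shows only the generating step, not how you would store or check the token.

```python
import uuid

hidden_field = '<input name="%s" type="hidden" value="%s" />'


def new_token():
    # 32 hexadecimal characters, different on every call
    return uuid.uuid4().hex


def hidden_input(name, value):
    # Build the HTML for one hidden form field
    return hidden_field % (name, value)


token = new_token()
html = hidden_input('session_token', token)
```

When the form comes back, the same value arrives as an ordinary parameter and can be read like any other field.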

Using getall()

If the application were larger, with several possible forms, you might not know in advance exactly which parameters are going to be present. In that case, you can use getall() instead of getform(). You can then check for the presence of specific parameters and perform different actions based on which form has been submitted:

formdict = getall(theform)
if formdict.has_key('rating'):
    process_feedback(formdict)
    # user is submitting feedback
elif formdict.has_key('email'):
    subscribe(formdict)
    # user is subscribing to the email list
else:
    optionlist()
    # display a form with all the options in

Using getall(), you can actually turn our last example script into something a bit more generic and useful:

import cgi

mainpage = '''<html><head><title>Receiving a \
  Form</title></head><body>%s</body></html>'''
result = '''<h1>Receiving a Form Submission</h1>
<p>We received the following parameters from the form :</p>
<ul>%s</ul>'''
li = "<li>%s = %s</li>"

if __name__ == '__main__':
    cgiprint(contentheader)   # content header
    cgiprint()                # finish headers with blank line

    theform = cgi.FieldStorage()    
    formdict = getall(theform)
    params = []
    for entry in formdict:
        params.append(li % (entry, str(formdict[entry])))

    print mainpage % (result % ''.join(params))

This code gets all the parameters submitted to it using getall(). It then inserts them into the page as an unordered list. If you send this script a form submission, the page it displays shows you all the parameters received, where each line will look like parameter = value. Because the line of code that produces this uses the str() function for each value, it can cope with list values.
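One thing the listing above doesn't do is escape the names and values before putting them into the page, so a value containing < or & is passed to the browser as markup. A minimal sketch of one way to handle that follows. The function render_items is hypothetical, and the import fallback is an assumption so that the code runs on both Python 3 (html.escape) and Python 2 (cgi.escape).

```python
try:
    from html import escape   # Python 3.2+
except ImportError:
    from cgi import escape    # Python 2

li = "<li>%s = %s</li>"


def render_items(formdict):
    """Build the list items, escaping every name and value."""
    items = []
    for name in sorted(formdict):
        value = str(formdict[name])
        items.append(li % (escape(name), escape(value)))
    return ''.join(items)


items_html = render_items({'name': '<b>Joe</b>'})
```

The sorted() call is only there to give the items a predictable order.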

A List of Values

As mentioned before, it's possible for different parameters in the form to have the same name. In this case, the value returned in the FieldStorage is a list. You could use this to gather information from your user. For example, a list of areas they are interested in for newsletters you may be sending out:

<form action="/cgi-bin/formprocessor.py" method="get">
  What is Your Name : <input name="name" 
    type="text" value="Joe Bloggs" /><br />
  Email Address : <input name="email" 
    type="text" /><br />
  <input name="interests" type="checkbox"  value="computers" />Computers<br />
  <input name="interests" type="checkbox"  value="sewing" />Sewing<br />
  <input name="interests" type="checkbox"  value="ballet" />Ballet<br />
  <input name="interests" type="checkbox"  value="scuba" />Scuba Diving<br />
  <input name="interests" type="checkbox"  value="cars" />Cars<br />

  <input type="reset"  />
  <input type="submit" />
</form>

When the form above is submitted, it will have a value for the user's name, their email address, and a list of all the interests they checked. The code to directly fetch the value from the FieldStorage instance is:

import cgi
theform = cgi.FieldStorage()
interests = theform['interests'].value

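The difficulty is that the result depends on how many boxes the user checks. If only one choice is checked, theform['interests'] is a single field, and interests is a single value rather than the list we are expecting. If several are checked, theform['interests'] is itself a list of fields, which has no value attribute. The alternative is to use the higher level methods available in FieldStorage.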

The getlist() method always returns a list, even if only a single value was supplied. If no boxes at all were checked, it returns an empty list.

import cgi
theform = cgi.FieldStorage()
interests = theform.getlist('interests')    

It would be very easy to adapt the getform() and getall() functions to your particular needs when dealing with values that you expect to be lists.
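One possible adaptation, sketched under the assumption that you start from a dictionary like the one getform() returns: name the fields you expect to be lists and normalize just those. The helper listify() is hypothetical and works on plain dictionaries, so it can be tried without a form.

```python
def listify(data, listfields):
    """Return a copy of data where each named field is always a list."""
    result = dict(data)
    for field in listfields:
        value = result.get(field)
        if value is None or value == '':
            # missing or blank: no boxes were checked
            result[field] = []
        elif not isinstance(value, list):
            # a single value: wrap it so callers can always loop
            result[field] = [value]
    return result


one_choice = listify({'name': 'Joe Bloggs', 'interests': 'sewing'},
                     ['interests'])
```

Fields that aren't named in listfields, such as name here, are left exactly as they were.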

Experimenting Yourself

You don't need an online server to test CGIs. You can code and debug web applications on your local machine, which is good news for those who still pay for Internet access by the minute. With a server running as localhost on your own machine, you can perform the "code, test, tear out hair, debug, and repeat" cycle from the comfort of your own armchair. Try Xitami. It's a fast and lightweight web server, particularly for the Windows platform.

You need to take care when setting up the CGI on the server. It's not difficult, but there are several steps that must be done.

If the script is going on another server, rather than your own machine, you will probably have to upload it to the server with FTP. Your FTP client must be set to upload Python scripts as text. Once copied to the right directory, set the permissions correctly for it to run. Be sure to also set the proper path in the shebang line for the server. (See The Error 500 Checklist section for a few other pitfalls.)

You can find a web page full of CGI examples at http://www.voidspace.org.uk/python/cgi.shtml. These are available to test or download. They include an online anagram generator and various smaller test scripts. There is also a complete framework for doing user authentication and management from CGI, called logintools.

The Error 500 Checklist

Debugging CGIs can be frustrating. By default any problem with your CGI script results in the anonymous error 500. Actual details of the error are written into the server log, which can be helpful, if you can get access to the log.

However, more than half of 500 errors can be easily solved by checking the following common sources of mistakes. You'd be surprised by how often one of these basic gotchas will getcha!

  • Was your script uploaded to the server in 'text' mode? (Is your FTP client set to recognize .py files as text?)
  • Have you set the script permissions to mode 755 (executable by everyone) [12]?
  • Have you set the path to Python in the first line correctly?
  • Did you print valid header lines, including the final blank line?
  • Finally, some servers require the script to be in the cgi-bin folder (or a subdirectory) and some even require the file extension to be .cgi rather than .py.
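
Some of these checks can be automated. The sketch below is a hypothetical helper, not a complete diagnostic: it looks only at the permission bits and the first line of a script. It assumes that a carriage return at the end of the shebang line is the symptom of a binary-mode upload from Windows, which is a common case but not the only one. It doesn't check the headers your script prints or any server configuration.

```python
import os
import stat


def check_mode(mode):
    """Report a problem unless every bit of mode 755 is set."""
    problems = []
    if (mode & 0o755) != 0o755:
        problems.append('permissions are %o, expected 755' % mode)
    return problems


def check_first_line(first_line):
    """Check the first line of the script, given as bytes."""
    problems = []
    if not first_line.startswith(b'#!'):
        problems.append('first line is not a shebang line')
    if first_line.endswith(b'\r'):
        problems.append('shebang line ends with a carriage return')
    return problems


def check_cgi_script(path):
    """Run both checks against the script stored at path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    with open(path, 'rb') as handle:
        first_line = handle.readline().rstrip(b'\n')
    return check_mode(mode) + check_first_line(first_line)
```

Calling check_cgi_script() with the path to your script returns a list of messages, which is empty when both checks pass.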

Conclusion

We've covered all the basics of CGI. The information here is enough to get you up and running, and at least looking in the right direction for information.

There's a lot more though: character encoding, using templates to output similar HTML code repeatedly, and finding out about the HTTP request the user sent, to mention just a few topics.

In the next part of "Python at the Other End of the Web," we'll touch on these subjects when we use what we've learnt so far to build an example application.

[1] The full CGI specification can be found at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
[2] There's actually no requirement for URLs to map directly to files, but for static content it's the obvious way of doing it.
[3] One common alternative is to embed an interpreter into your server, for example using Apache with mod_python. This means that the interpreter doesn't have to restart in between requests. This can also make session management easier. It does introduce a whole host of other problems of course. Another alternative is to use a special application server like Zope.
[4] Quick Reference to HTTP Headers http://www.cs.tut.fi/~jkorpela/http.html
[5] The RFC stating that headers end '\r\n' is the very long RFC 2616. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
[6] Most Python CGIs will run on Linux type servers. If header lines are sent using the normal 'print' command then they will be terminated with '\n'. This is technically invalid, but usually won't matter.
[7] #!/usr/bin/python is one of the more common ones. #!/usr/bin/env python and #!/usr/local/bin/python are also common. It is likely that one of these will work.
[8] On shared hosting accounts, CGI scripts are likely to be restricted to a maximum running time of 60 seconds or even 30 seconds. After this, the server usually kills them. If you use your own server this won't be a problem of course. Using something like mod_python may be a way round this CGI restriction, or you can code around it by "chaining" requests.
[9] There is a good forms tutorial at http://www.csd.abdn.ac.uk/~apreece/teaching/CS1009/practicals/forms.html (It's HTML rather than XHTML, but it's still a nice reference.)
[10] The RFC defining URL encoding is RFC 1738. See http://rfc.net/rfc1738.html
[11] As well as sending the values from forms in a POST request, it is also possible to allow file uploads. This allows your user to select a file from their local hard drive and encode it into their request. An example CGI that can receive file uploads (including the HTML form needed to allow it) can be found at http://www.voidspace.org.uk/python/cgi.shtml#upload
[12] On a Linux server the script is run as nobody. This means that the script must be executable by everybody or it won't run as a CGI. It also means that any files it needs to access/write over must be readable/writable by everybody.

Michael Foord
This article is the Copyright of beehive KG and Py - the Python Technical Journal (2003-2005)
http://www.pyzine.com

